Search | BVS Regional Portal

1.

Weighting Low-Intensity MS/MS Ions and m/z Frequency for Spectral Library Annotation.

Engler Hart, Chloe; Kind, Tobias; Dorrestein, Pieter C; Healey, David; Domingo-Fernández, Daniel.

J Am Soc Mass Spectrom ; 35(2): 266-274, 2024 Feb 07.

Article in English | MEDLINE | ID: mdl-38271611

ABSTRACT

Calculating spectral similarity is a fundamental step in MS/MS data analysis in untargeted metabolomics experiments, as it facilitates the identification of related spectra and the annotation of compounds. To improve matching accuracy when querying an experimental mass spectrum against a spectral library, previous approaches have proposed increasing peak intensities for high m/z ranges. Peaks at high m/z tend to have lower intensities, yet they carry more crucial information for identifying the chemical structure. Here, we evaluate the impact of using these weights for identifying structurally related compounds and for mass spectral library searches. Additionally, we propose a weighting approach that (i) takes into account the frequency of the m/z values within a spectral library in order to assign higher importance to the most common peaks and (ii) increases the intensity of lower peaks, similar to previous approaches. To demonstrate our approach, we applied weighting preprocessing to the modified cosine, entropy, and fidelity distance metrics and benchmarked it against previously reported weights. Our results demonstrate how weighting-based preprocessing can assist in annotating the structure of unknown spectra as well as identifying structurally similar compounds. Finally, we examined scenarios in which the use of weights resulted in diminished performance, pinpointing spectral features where the application of weights might be detrimental.

Subject(s)

Metabolomics , Tandem Mass Spectrometry , Metabolomics/methods , Ions
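A minimal sketch of the weighting idea, written as an illustration rather than the authors' implementation: it bins peaks by rounded m/z, compresses intensities with a square-root transform (the exponent 0.5 is an assumption) so that low-intensity peaks gain relative weight, and multiplies each binned intensity by 1 plus the fraction of library spectra containing that m/z, a hypothetical form of frequency weighting. It scores with a plain binned cosine; the modified cosine, entropy, and fidelity metrics benchmarked in the paper are not reproduced, and all function names are hypothetical.

```python
import math
from collections import Counter


def mz_frequency_weights(library, decimals=0):
    """Fraction of library spectra that contain each rounded m/z value."""
    counts = Counter()
    for spectrum in library:
        # A set ensures each rounded m/z is counted at most once per spectrum
        counts.update({round(mz, decimals) for mz, _ in spectrum})
    return {mz: c / len(library) for mz, c in counts.items()}


def weighted_vector(spectrum, freq, power=0.5, decimals=0):
    """Bin a spectrum by rounded m/z and apply intensity and frequency weights."""
    vector = {}
    for mz, intensity in spectrum:
        key = round(mz, decimals)
        # power < 1 compresses dominant peaks, so low-intensity peaks gain relative weight
        scaled = intensity ** power
        # Hypothetical frequency weight: more common m/z values get a larger multiplier
        vector[key] = vector.get(key, 0.0) + scaled * (1.0 + freq.get(key, 0.0))
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse binned vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


library = [
    [(100.0, 50.0), (150.0, 10.0)],
    [(100.0, 80.0), (200.0, 5.0)],
    [(120.0, 30.0)],
]
freq = mz_frequency_weights(library)
query = weighted_vector([(100.02, 60.0), (150.1, 8.0)], freq)
reference = weighted_vector(library[0], freq)
print(round(cosine(query, reference), 3))
```

Because the frequency term only ever increases a weight, query peaks absent from the library keep their transformed intensity instead of being discarded.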

2.

Multi-ontology embeddings approach on human-aligned multi-ontologies representation for gene-disease associations prediction.

Wang, Yihao; Wegner, Philipp; Domingo-Fernández, Daniel; Tom Kodamullil, Alpha.

Heliyon ; 9(11): e21502, 2023 Nov.

Article in English | MEDLINE | ID: mdl-38027969

ABSTRACT

Objectives: Knowledge graphs and ontologies in the biomedical domain provide rich contextual knowledge for a variety of challenges. Leveraging this knowledge for knowledge-driven NLP tasks such as gene-disease association prediction is a promising way to increase the predictive power of a model. Methods: We investigated the power of infusing the embeddings of two aligned ontologies as prior knowledge into NLP models. We evaluated the performance of different models on large-scale gene-disease association datasets and compared them with a model that does not incorporate contextualized knowledge (BERT). Results: The experiments demonstrated that the knowledge-infused model slightly outperforms BERT by creating a small number of bridges, indicating that incorporating cross-references across ontologies can enhance the performance of base models without the need for more complex and costly training. However, further research is needed to explore the generalizability of the model. Based on the trend observed in the experiments, we expect that adding more bridges would bring further improvement. In addition, applying state-of-the-art knowledge graph embedding methods to a joint graph built by connecting OGG and DOID with bridges also yielded promising results. Conclusion: Our work shows that allowing language models to leverage structured knowledge from ontologies brings clear advantages in performance. Moreover, the annotation stage introduced in this paper is kept to a reasonable complexity.

3.

Exploring the known chemical space of the plant kingdom: insights into taxonomic patterns, knowledge gaps, and bioactive regions.

Domingo-Fernández, Daniel; Gadiya, Yojana; Mubeen, Sarah; Healey, David; Norman, Bryan H; Colluru, Viswa.

J Cheminform ; 15(1): 107, 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-37950325

ABSTRACT

Plants are one of the primary sources of natural products for drug development. However, despite centuries of research, only a limited region of the phytochemical space has been studied. To understand the scope of what is explored versus unexplored in the phytochemical space, we begin by reconstructing the known chemical space of the plant kingdom, mapping the distribution of secondary metabolites, chemical classes, and plants traditionally used for medicinal purposes (i.e., medicinal plants) across various levels of the taxonomy. We identify hotspot taxonomic clades occupied by a large proportion of medicinal plants and characterized secondary metabolites, as well as clades requiring further characterization with regard to their chemical composition. In a complementary analysis, we build a chemotaxonomy that has a high level of concordance with the taxonomy at the genus level, highlighting the close relationship between chemical profiles and evolutionary relationships within the plant kingdom. Next, we delve into regions of the phytochemical space with known bioactivity that have been used in modern drug discovery. While we find that the vast majority of approved drugs from phytochemicals are derived from known medicinal plants, we also show that medicinal and non-medicinal plants do not occupy distinct regions of the known phytochemical landscape and that their phytochemicals exhibit properties similar to bioactive compounds. Moreover, we reveal that only a few thousand phytochemicals have been screened for bioactivity and that hundreds of known bioactive compounds are present in both medicinal and non-medicinal plants, suggesting that non-medicinal plants also have potential therapeutic applications. Overall, these results support the hypothesis that there are many plants with medicinal properties awaiting discovery.

4.

MultiGML: Multimodal graph machine learning for prediction of adverse drug events.

Krix, Sophia; DeLong, Lauren Nicole; Madan, Sumit; Domingo-Fernández, Daniel; Ahmad, Ashar; Gul, Sheraz; Zaliani, Andrea; Fröhlich, Holger.

Heliyon ; 9(9): e19441, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37681175

ABSTRACT

Adverse drug events constitute a major challenge for the success of clinical trials. Several computational strategies have been suggested to estimate the risk of adverse drug events in preclinical drug development. While these approaches have demonstrated high utility in practice, they are limited to specific information sources. Thus, many current computational approaches neglect a wealth of information that results from the integration of different data sources, such as biological protein function, gene expression, chemical compound structure, cell-based imaging and others. In this work, we propose an integrative and explainable multimodal Graph Machine Learning approach (MultiGML), which fuses knowledge graphs with multiple further data modalities to predict drug-related adverse events and general drug target-phenotype associations. MultiGML demonstrates excellent prediction performance compared to alternative algorithms, including various traditional knowledge graph embedding techniques. MultiGML distinguishes itself from alternative techniques by providing in-depth explanations of model predictions, which point towards biological mechanisms associated with predictions of an adverse drug event. Hence, MultiGML could be a versatile tool to support decision making in preclinical drug development.

5.

Modern drug discovery using ethnobotany: A large-scale cross-cultural analysis of traditional medicine reveals common therapeutic uses.

Domingo-Fernández, Daniel; Gadiya, Yojana; Mubeen, Sarah; Bollerman, Thomas Joseph; Healy, Matthew D; Chanana, Shaurya; Sadovsky, Rotem Gura; Healey, David; Colluru, Viswa.

iScience ; 26(9): 107729, 2023 Sep 15.

Article in English | MEDLINE | ID: mdl-37701812

ABSTRACT

For millennia, numerous cultures and civilizations have relied on traditional remedies derived from plants to treat a wide range of conditions and ailments. Here, we systematically analyzed ethnobotanical patterns across taxonomically related plants, demonstrating that congeneric medicinal plants are more likely to be used for treating similar indications. Next, we reconstructed the phytochemical space covered by medicinal plants to reveal that (i) taxonomically related medicinal plants cover a similar phytochemical space, and (ii) chemical similarity correlates with similar therapeutic usage. Lastly, we present several case scenarios illustrating how mining this information can be used for drug discovery applications, including: (i) investigating taxonomic hotspots around particular indications, (ii) exploring shared patterns of congeneric plants that are located in different geographic areas but have been used to treat the same indications, and (iii) showing the concordance between ethnobotanical patterns among non-taxonomically related plants and the presence of shared bioactive phytochemicals.

6.

On the correspondence between the transcriptomic response of a compound and its effects on its targets.

Engler Hart, Chloe; Ence, Daniel; Healey, David; Domingo-Fernández, Daniel.

BMC Bioinformatics ; 24(1): 207, 2023 May 19.

Article in English | MEDLINE | ID: mdl-37208587

ABSTRACT

A better understanding of the transcriptomic response produced by a compound perturbing its targets can shed light on the underlying biological processes regulated by the compound. However, establishing the relationship between the induced transcriptomic response and the target of a compound is non-trivial, partly because targets are rarely differentially expressed. Therefore, connecting both modalities requires orthogonal information (e.g., pathway or functional information). Here, we present a comprehensive study aimed at exploring this relationship by leveraging thousands of transcriptomic experiments and target data for over 2000 compounds. Firstly, we confirm that compound-target information does not correlate as expected with the transcriptomic signatures induced by a compound. However, we reveal how the concordance between both modalities increases when pathway and target information are connected. Additionally, we investigate whether compounds that target the same proteins induce a similar transcriptomic response and, conversely, whether compounds with similar transcriptomic responses share the same target proteins. While our findings suggest that this is generally not the case, we did observe that compounds with similar transcriptomic profiles are more likely to share at least one protein target and common therapeutic applications. Finally, we demonstrate how to exploit the relationship between both modalities for mechanism-of-action deconvolution by presenting a case scenario involving a few compound pairs with high similarity.

Subject(s)

Gene Expression Profiling , Transcriptome , Proteins

7.

Integrative analysis to identify shared mechanisms between schizophrenia and bipolar disorder and their comorbidities.

Bharadhwaj, Vinay Srinivas; Mubeen, Sarah; Sargsyan, Astghik; Jose, Geena Mariya; Geissler, Stefan; Hofmann-Apitius, Martin; Domingo-Fernández, Daniel; Kodamullil, Alpha Tom.

Prog Neuropsychopharmacol Biol Psychiatry ; 122: 110688, 2023 Mar 02.

Article in English | MEDLINE | ID: mdl-36462601

ABSTRACT

Schizophrenia and bipolar disorder are characterized by highly similar neuropsychological signatures, implying shared neurobiological mechanisms between these two disorders. These disorders also share comorbidities, such as type 2 diabetes mellitus (T2DM). To date, the mechanisms that mediate the link between these two disorders remain incompletely understood. In this work, we identify and investigate shared patterns across multiple schizophrenia, bipolar disorder and T2DM gene expression datasets through multiple strategies. Firstly, we investigate dysregulation patterns at the gene level and compare our findings against disease-specific knowledge graphs (KGs). Secondly, we analyze the concordance of co-expression patterns across datasets to identify disease-specific as well as common pathways. Thirdly, we examine enriched pathways across datasets and disorders to identify common biological mechanisms between them. Lastly, we investigate the correspondence of shared genetic variants between these two disorders and T2DM as well as the disease-specific KGs. In conclusion, our work reveals several shared candidate genes and pathways, particularly those related to the immune system (e.g., the TNF, IL-17, and NF-kappa B signaling pathways) and the nervous system (e.g., dopaminergic and GABAergic synapses), which we propose mediate the link between schizophrenia, bipolar disorder, and their shared comorbidity, T2DM.

Subject(s)

Bipolar Disorder , Diabetes Mellitus, Type 2 , Schizophrenia , Humans , Bipolar Disorder/psychology , Schizophrenia/epidemiology , Schizophrenia/genetics , Comorbidity , Signal Transduction

8.

Unifying the identification of biomedical entities with the Bioregistry.

Hoyt, Charles Tapley; Balk, Meghan; Callahan, Tiffany J; Domingo-Fernández, Daniel; Haendel, Melissa A; Hegde, Harshad B; Himmelstein, Daniel S; Karis, Klas; Kunze, John; Lubiana, Tiago; Matentzoglu, Nicolas; McMurry, Julie; Moxon, Sierra; Mungall, Christopher J; Rutz, Adriano; Unni, Deepak R; Willighagen, Egon; Winston, Donald; Gyori, Benjamin M.

Sci Data ; 9(1): 714, 2022 Nov 19.

Article in English | MEDLINE | ID: mdl-36402838

ABSTRACT

The standardized identification of biomedical entities is a cornerstone of interoperability, reuse, and data integration in the life sciences. Several registries have been developed to catalog resources maintaining identifiers for biomedical entities such as small molecules, proteins, cell lines, and clinical trials. However, existing registries have struggled to provide sufficient coverage and metadata standards that meet the evolving needs of modern life sciences researchers. Here, we introduce the Bioregistry, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries. The Bioregistry addresses the need for a sustainable registry by leveraging public infrastructure and automation, and employing a progressive governance model centered around open code and open data to foster community contribution. The Bioregistry can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse. The Bioregistry can be accessed through https://bioregistry.io and its source code and data are available under the MIT and CC0 Licenses at https://github.com/biopragmatics/bioregistry .

9.

Ensembles of knowledge graph embedding models improve predictions for drug discovery.

Rivas-Barragan, Daniel; Domingo-Fernández, Daniel; Gadiya, Yojana; Healey, David.

Brief Bioinform ; 23(6), 2022 Nov 19.

Article in English | MEDLINE | ID: mdl-36384050

ABSTRACT

Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark, in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug-disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can later be tested in clinical trials following extensive experimental validation. However, because evaluating every prediction is infeasible and only a small number of candidates can be experimentally tested, models that yield higher precision on the top-prioritized triples are preferred. In this paper, we apply ensemble learning to KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug-disease triples on two independent biomedical KGs designed for drug discovery. Next, we applied different ensemble methods that aggregate the predictions of these models by leveraging either the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., the number of true positives prioritized) of their top predictions. Lastly, the source code presented in this work is available at https://github.com/enveda/kgem-ensembles-in-drug-discovery.

Subject(s)

Drug Discovery , Pattern Recognition, Automated , Knowledge , Machine Learning , Software
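The two ensemble families named in the abstract, aggregating by the distribution or by the position of the predicted scores, can be sketched as follows. This is a hypothetical illustration, not the code in the released repository: it z-score normalizes each model's scores before averaging (distribution-based) or averages per-model ranks (position-based), then measures precision among the top-k triples. Triple identifiers are shortened to strings, and all function names are assumptions.

```python
from statistics import mean, pstdev


def zscores(scores):
    """Standardize one model's scores so models on different scales are comparable."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {t: (s - mu) / sd if sd else 0.0 for t, s in scores.items()}


def ranks(scores):
    """Rank triples by score (1 = best); ties are broken by triple identifier."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return {t: i + 1 for i, t in enumerate(ordered)}


def ensemble_by_distribution(model_scores):
    normalized = [zscores(s) for s in model_scores]
    return {t: mean(n[t] for n in normalized) for t in model_scores[0]}


def ensemble_by_position(model_scores):
    ranked = [ranks(s) for s in model_scores]
    # Negate the mean rank so that, as for scores, higher means more promising
    return {t: -mean(r[t] for r in ranked) for t in model_scores[0]}


def top_k(scores, k):
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def precision_at_k(scores, positives, k):
    """Fraction of the top-k prioritized triples that are known positives."""
    return sum(t in positives for t in top_k(scores, k)) / k


model_a = {"drug1-d": 0.9, "drug2-d": 0.2, "drug3-d": 0.5}
model_b = {"drug1-d": 3.0, "drug2-d": 1.0, "drug3-d": 8.0}
print(top_k(ensemble_by_position([model_a, model_b]), 2))
```

Normalizing before averaging matters because raw KGEM scores live on different scales; the rank-based variant sidesteps scale entirely at the cost of discarding score margins.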

10.

Exploring the Complex Network of Heme-Triggered Effects on the Blood Coagulation System.

Mubeen, Sarah; Domingo-Fernández, Daniel; Díaz Del Ser, Sara; Solanki, Dhwani M; Kodamullil, Alpha T; Hofmann-Apitius, Martin; Hopp, Marie-T; Imhof, Diana.

J Clin Med ; 11(19), 2022 Oct 10.

Article in English | MEDLINE | ID: mdl-36233841

ABSTRACT

Excess labile heme, which occurs under hemolytic conditions, acts as a versatile modulator of the blood coagulation system. As such, heme provokes prothrombotic states, either by binding to plasma proteins or through interaction with participating cell types. However, despite several independent reports on these effects, apparently contradictory observations and significant knowledge gaps characterize this relationship, which hampers a complete understanding of heme-driven coagulopathies and the development of suitable and specific treatment options. Thus, we present herein a computational exploration of the complex network of heme-triggered effects in the blood coagulation system. Combining hemostasis- and heme-specific terminology, we curated the knowledge available thus far and modeled it in a mechanistic interactome. Further, these data were incorporated into the previously established heme knowledge graph, "HemeKG", to better comprehend the knowledge surrounding heme biology. Finally, a pathway enrichment analysis of these data provided deep insights into previously unknown links and novel experimental targets within the blood coagulation cascade and platelet activation pathways for further investigation of the prothrombotic nature of heme. In summary, this study enables, for the first time, a detailed network analysis of the effects of heme in the blood coagulation system.

11.

Elucidating gene expression patterns across multiple biological contexts through a large-scale investigation of transcriptomic datasets.

Figueiredo, Rebeca Queiroz; Del Ser, Sara Díaz; Raschka, Tamara; Hofmann-Apitius, Martin; Kodamullil, Alpha Tom; Mubeen, Sarah; Domingo-Fernández, Daniel.

BMC Bioinformatics ; 23(1): 231, 2022 Jun 15.

Article in English | MEDLINE | ID: mdl-35705903

ABSTRACT

Distinct gene expression patterns within cells are foundational for the diversity of functions and unique characteristics observed in specific contexts, such as human tissues and cell types. Although some biological processes occur across contexts, harnessing the vast amounts of available gene expression data allows us to decipher the processes that are unique to a specific context. Therefore, with the goal of developing a portrait of context-specific patterns to better elucidate how they govern distinct biological processes, this work presents a large-scale exploration of transcriptomic signatures across three different contexts (i.e., tissues, cell types, and cell lines) by leveraging over 600 gene expression datasets categorized into 98 subcontexts. The strongest pairwise correlations between genes from these subcontexts are used to construct co-expression networks. Using a network-based approach, we then pinpoint patterns that are unique and common across these subcontexts. First, we focused on patterns at the level of individual nodes and evaluated their functional roles using a human protein-protein interactome as a reference network. Next, within each context, we systematically overlaid the co-expression networks to identify specific and shared correlations as well as relations already described in the scientific literature. Additionally, in a pathway-level analysis, we overlaid node and edge sets from co-expression networks against pathway knowledge to identify biological processes that are related to specific subcontexts or groups of them. Finally, we have released our data and scripts at https://zenodo.org/record/5831786 and https://github.com/ContNeXt/ , respectively, and developed ContNeXt ( https://contnext.scai.fraunhofer.de/ ), a web application to explore the networks generated in this work.

Subject(s)

Gene Regulatory Networks , Transcriptome , Gene Expression Profiling , Humans , Software
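The network construction and overlay steps described in this abstract can be sketched as follows, assuming a small in-memory table of expression values per gene. Pearson correlation, a fixed top_n cutoff on absolute correlation, and the set-based overlay are illustrative choices; the study's actual correlation measure and thresholds are not given in the abstract, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expression, top_n):
    """Keep the top_n gene pairs with the strongest absolute correlation."""
    scored = [
        (abs(pearson(expression[g1], expression[g2])), (g1, g2))
        for g1, g2 in combinations(sorted(expression), 2)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return {pair for _, pair in scored[:top_n]}


def overlay(networks):
    """Split edges into those shared by all subcontexts and those unique to one."""
    shared = set.intersection(*networks.values())
    specific = {}
    for name, edges in networks.items():
        others = set().union(*(e for other, e in networks.items() if other != name))
        specific[name] = edges - others
    return shared, specific


expression = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],
    "G3": [4, 1, 3, 2],
    "G4": [1, 1, 2, 1],
}
print(coexpression_edges(expression, top_n=1))
```

Edges are stored as sorted gene pairs, so the same correlation found in two subcontexts compares equal during the overlay.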

12.

Integrative data semantics through a model-enabled data stewardship.

Wegner, Philipp; Schaaf, Sebastian; Uebachs, Mischa; Domingo-Fernández, Daniel; Salimi, Yasamin; Gebel, Stephan; Sargsyan, Astghik; Birkenbihl, Colin; Springstubbe, Stephan; Klockgether, Thomas; Fluck, Juliane; Hofmann-Apitius, Martin; Kodamullil, Alpha Tom.

Bioinformatics ; 38(15): 3850-3852, 2022 Aug 02.

Article in English | MEDLINE | ID: mdl-35652780

ABSTRACT

MOTIVATION: The importance of clinical data in understanding the pathophysiology of complex disorders has prompted the launch of multiple initiatives designed to generate patient-level data from various modalities. While these studies can reveal important findings relevant to the disease, each study captures different yet complementary aspects and modalities which, when combined, generate a more comprehensive picture of disease etiology. However, achieving this requires a global integration of data across studies, which proves challenging given the lack of interoperability of cohort datasets. RESULTS: Here, we present the Data Steward Tool (DST), an application that allows for semi-automatic semantic integration of clinical data into ontologies, global data models, and data standards. We demonstrate the applicability of the tool in the field of dementia research by establishing a Clinical Data Model (CDM) in this domain. The CDM currently consists of 277 common variables covering demographics (e.g. age and gender), diagnostics, neuropsychological tests and biomarker measurements. Combined with this disease-specific data model, the DST shows how interoperability between multiple, heterogeneous dementia datasets can be achieved. AVAILABILITY AND IMPLEMENTATION: The DST source code and Docker images are available at https://github.com/SCAI-BIO/data-steward and https://hub.docker.com/r/phwegner/data-steward, respectively. Furthermore, the DST is hosted at https://data-steward.bio.scai.fraunhofer.de/data-steward. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Dementia , Semantics , Humans , Software , Dementia/diagnosis

13.

ADataViewer: exploring semantically harmonized Alzheimer's disease cohort datasets.

Salimi, Yasamin; Domingo-Fernández, Daniel; Bobis-Álvarez, Carlos; Hofmann-Apitius, Martin; Birkenbihl, Colin.

Alzheimers Res Ther ; 14(1): 69, 2022 May 21.

Article in English | MEDLINE | ID: mdl-35598021

ABSTRACT

BACKGROUND: Currently, Alzheimer's disease (AD) cohort datasets are difficult to find and lack across-cohort interoperability, and the actual content of publicly available datasets often only becomes clear to third-party researchers once data access has been granted. These aspects severely hinder the advancement of AD research through emerging data-driven approaches such as machine learning and artificial intelligence, and bias current data-driven findings towards the few commonly used, well-explored AD cohorts. To achieve robust and generalizable results, validation across multiple datasets is crucial. METHODS: We accessed and systematically investigated the content of 20 major AD cohort datasets at the data level. A medical professional and a data specialist manually curated and semantically harmonized the acquired datasets. Finally, we developed a platform that displays vital information about the available datasets. RESULTS: Here, we present ADataViewer, an interactive platform that facilitates the exploration of 20 cohort datasets with respect to longitudinal follow-up, demographics, ethnoracial diversity, measured modalities, and statistical properties of individual variables. It allows researchers to quickly identify AD cohorts that meet user-specified requirements for discovery and validation studies regarding available variables, sample sizes, and longitudinal follow-up. Additionally, we publish the underlying variable mapping catalog that harmonizes 1196 unique variables across the 20 cohorts and paves the way for interoperable AD datasets. CONCLUSIONS: ADataViewer facilitates fast, robust data-driven research by transparently displaying cohort dataset content and supporting researchers in selecting datasets that are suited for their envisioned study. The platform is available at https://adata.scai.fraunhofer.de/ .

Subject(s)

Alzheimer Disease , Artificial Intelligence , Cohort Studies , Humans , Sample Size

14.

On the influence of several factors on pathway enrichment analysis.

Mubeen, Sarah; Tom Kodamullil, Alpha; Hofmann-Apitius, Martin; Domingo-Fernández, Daniel.

Brief Bioinform ; 23(3), 2022 May 13.

Article in English | MEDLINE | ID: mdl-35453140

ABSTRACT

Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, and these may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges that originate from the outlined factors.

Subject(s)

Databases, Factual , Factor Analysis, Statistical , Longitudinal Studies
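Over-representation analysis with a one-sided hypergeometric test is one common form of pathway enrichment, and several of the factors the review discusses (such as the choice of gene universe and of pathway database) enter directly as its inputs. The following is a minimal sketch, not taken from the review, with hypothetical function names:

```python
from math import comb


def hypergeometric_sf(n_universe, n_pathway, n_query, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(population n_universe,
    n_pathway successes, n_query draws)."""
    upper = min(n_pathway, n_query)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_query)


def over_representation(query_genes, pathways, universe):
    """One-sided enrichment p-value per pathway, restricted to the gene universe."""
    universe = set(universe)
    query = set(query_genes) & universe
    results = {}
    for name, members in pathways.items():
        # Genes outside the universe are dropped from both the query and the pathway
        members = set(members) & universe
        overlap = len(query & members)
        results[name] = hypergeometric_sf(len(universe), len(members), len(query), overlap)
    return results


universe = [f"g{i}" for i in range(20)]
pathways = {"P1": universe[:5], "P2": universe[10:15]}
print(over_representation(universe[:5], pathways, universe))
```

Changing the universe alone changes every p-value, which is one concrete way the background choice influences results; in practice the p-values would also need multiple-testing correction across pathways.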

15.

Corrigendum to "Machine Learning Based Prediction of COVID-19 Mortality Suggests Repositioning of Anticancer Drug for Treating Severe Cases" [Artificial Intelligence in Life Sciences] 1 (2021), 100020.

Linden, Thomas; Hanses, Frank; Domingo-Fernández, Daniel; DeLong, Lauren Nicole; Kodamullil, Alpha Tom; Schneider, Jochen; Vehreschild, Maria J G T; Lanznaster, Julia; Ruethrich, Maria Madeleine; Borgmann, Stefan; Hower, Martin; Wille, Kai; Feldt, Torsten; Rieg, Siegbert; Hertenstein, Bernd; Wyen, Christoph; Roemmele, Christoph; Vehreschild, Jörg Janne; Jakob, Carolin E M; Stecher, Melanie; Kuzikov, Maria; Zaliani, Andrea; Fröhlich, Holger.

Artif Intell Life Sci ; 2: 100032, 2022 Dec.

Article in English | MEDLINE | ID: mdl-35156080

ABSTRACT

[This corrects the article DOI: 10.1016/j.ailsci.2021.100020.].

16.

Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery.

Domingo-Fernández, Daniel; Gadiya, Yojana; Patel, Abhishek; Mubeen, Sarah; Rivas-Barragan, Daniel; Diana, Chris W; Misra, Biswapriya B; Healey, David; Rokicki, Joe; Colluru, Viswa.

PLoS Comput Biol ; 18(2): e1009909, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35213534

ABSTRACT

Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have shown significant early promise over other representations of biological data in applications such as target discovery, side effect prediction, and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed and disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment and anti-correlate with signatures observed in the disease of interest. The paths that match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, outperforming other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets.

Subject(s)

Pattern Recognition, Automated , Transcriptome , Algorithms , Drug Discovery/methods , Drug Repositioning/methods , Transcriptome/genetics
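The path-filtering idea can be illustrated with a simplified sketch. It is not RPath: it assumes a toy signed graph in which each edge increases (+1) or decreases (-1) its target, propagates the sign along each simple path, and keeps a path only if every measured intermediate node moves in the same direction as the drug signature and opposite to the disease signature. The extra requirement that the drug be predicted to decrease the disease node, the max_len cutoff, and all names are assumptions.

```python
def causal_paths(edges, source, target, max_len=4):
    """Enumerate simple paths from source to target with at most max_len edges.

    edges: iterable of (u, v, sign) with sign +1 (increases) or -1 (decreases).
    Each path is a list of (node, predicted_direction) pairs, where the direction
    is the product of edge signs from the source to that node.
    """
    graph = {}
    for u, v, edge_sign in edges:
        graph.setdefault(u, []).append((v, edge_sign))
    paths = []

    def dfs(node, direction, path, visited):
        if node == target:
            paths.append(list(path))
            return
        if len(path) > max_len:  # path already holds max_len edges
            return
        for nxt, edge_sign in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append((nxt, direction * edge_sign))
                dfs(nxt, direction * edge_sign, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(source, 1, [(source, 1)], {source})
    return paths


def sign_of(x):
    return (x > 0) - (x < 0)


def path_is_consistent(path, drug_signature, disease_signature):
    """Keep a path if measured intermediate nodes follow the drug signature,
    oppose the disease signature, and the disease node is predicted to decrease."""
    for node, predicted in path[1:-1]:
        drug_obs = sign_of(drug_signature.get(node, 0.0))
        disease_obs = sign_of(disease_signature.get(node, 0.0))
        # Unmeasured or unchanged nodes (sign 0) neither support nor reject a path
        if drug_obs and drug_obs != predicted:
            return False
        if disease_obs and disease_obs != -predicted:
            return False
    return path[-1][1] == -1


edges = [
    ("drug", "A", -1), ("A", "B", 1), ("B", "disease", 1),
    ("drug", "C", 1), ("C", "disease", 1),
]
paths = causal_paths(edges, "drug", "disease")
drug_sig = {"A": -2.0, "B": -1.5, "C": 0.8}
disease_sig = {"A": 1.2, "B": 0.9}
kept = [p for p in paths if path_is_consistent(p, drug_sig, disease_sig)]
print([[node for node, _ in p] for p in kept])
```

Here the drug lowers A and B, which are raised in the disease, so that path survives, while the path through C is dropped because it predicts an increase of the disease node.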

17.

STonKGs: a sophisticated transformer trained on biomedical text and knowledge graphs.

Balabin, Helena; Hoyt, Charles Tapley; Birkenbihl, Colin; Gyori, Benjamin M; Bachman, John; Kodamullil, Alpha Tom; Plöger, Paul G; Hofmann-Apitius, Martin; Domingo-Fernández, Daniel.

Bioinformatics ; 38(6): 1648-1656, 2022 Mar 04.

Article in English | MEDLINE | ID: mdl-34986221

ABSTRACT

MOTIVATION: The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications that use either text, through natural language processing (NLP), or structured data, through knowledge graph embedding models. However, representations based on a single modality are inherently limited.

RESULTS: To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base, assembled by the Integrated Network and Dynamical Reasoning Assembler, that consists of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against three baseline models trained on only one of the modalities (i.e. text or KG) across eight classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms the baselines, especially on the tasks that are more challenging in terms of the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e. from 0.881 to 0.965). Finally, both our pre-trained model and the model architecture can be adapted to various other transfer learning applications.

AVAILABILITY AND IMPLEMENTATION: We make the source code and the Python package of STonKGs available at GitHub (https://github.com/stonkgs/stonkgs) and PyPI (https://pypi.org/project/stonkgs/). The pre-trained STonKGs models and the task-specific classification models are available at https://huggingface.co/stonkgs/stonkgs-150k and https://zenodo.org/communities/stonkgs, respectively.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
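STonKGs learns from combined input sequences in which unstructured text and structured KG information share a single Transformer input. As a rough illustration only, the sketch below assembles such a sequence from one text-triple pair. It assumes a whitespace tokenizer in place of a real subword tokenizer, BERT-style `[CLS]`/`[SEP]` markers, and a segment-id scheme (0 for text, 1 for KG) chosen here for illustration; the example sentence and triple are hypothetical, and the actual STonKGs input construction lives in the project's source code.

```python
from dataclasses import dataclass

# Hypothetical special tokens and segment ids, chosen for illustration only.
CLS, SEP = "[CLS]", "[SEP]"
TEXT_SEGMENT, KG_SEGMENT = 0, 1


@dataclass
class CombinedInput:
    tokens: list[str]
    segment_ids: list[int]


def build_combined_input(text: str, triple: tuple[str, str, str]) -> CombinedInput:
    """Concatenate a text evidence sentence and a KG triple into one sequence.

    The text is split on whitespace (a stand-in for a subword tokenizer);
    the triple contributes its subject, relation and object as single tokens.
    """
    text_tokens = text.split()
    kg_tokens = list(triple)
    tokens = [CLS] + text_tokens + [SEP] + kg_tokens + [SEP]
    # [CLS], the text tokens and the first [SEP] form the text segment;
    # the triple tokens and the final [SEP] form the KG segment.
    segment_ids = [TEXT_SEGMENT] * (len(text_tokens) + 2) + [KG_SEGMENT] * (len(kg_tokens) + 1)
    return CombinedInput(tokens, segment_ids)


# Hypothetical text-triple pair.
example = build_combined_input(
    "MAPK1 phosphorylates ELK1",
    ("MAPK1", "phosphorylates", "ELK1"),
)
```

Keeping both modalities in one sequence, distinguished only by segment ids, is what lets a single self-attention stack relate text tokens to KG tokens directly.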

Subject(s)

Pattern Recognition, Automated, Software, Machine Learning, Natural Language Processing, Publications

18.

Using predictive machine learning models for drug response simulation by calibrating patient-specific pathway signatures.

Golriz Khatami, Sepehr; Mubeen, Sarah; Bharadhwaj, Vinay Srinivas; Kodamullil, Alpha Tom; Hofmann-Apitius, Martin; Domingo-Fernández, Daniel.

NPJ Syst Biol Appl ; 7(1): 40, 2021 10 27.

Article in English | MEDLINE | ID: mdl-34707117

ABSTRACT

The utility of pathway signatures lies in their capability to determine whether a specific pathway or biological process is dysregulated in a given patient. These signatures have been widely used in machine learning (ML) methods for a variety of applications, including precision medicine, drug repurposing, and drug discovery. In this work, we leverage highly predictive ML models for drug response simulation in individual patients by calibrating the pathway activity scores of disease samples. Using these ML models and an intuitive scoring algorithm to modify the signatures of patients, we evaluate whether a given sample that was formerly classified as diseased could be predicted as normal following drug treatment simulation. We then use this technique as a proxy for the identification of potential drug candidates. Furthermore, we demonstrate that our methodology identifies approved and clinically investigated drugs for four different cancers, outperforming six comparable state-of-the-art methods. We also show how this approach can deconvolute a drug's mechanism of action and propose combination therapies. Taken together, our methodology could help support clinical decision-making in personalized medicine by simulating a drug's effect on a given patient.
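The simulation loop in this abstract can be sketched as follows: shift the pathway activity scores targeted by a drug, then check whether a classifier that called the sample diseased now calls it normal. This is a minimal sketch under stated assumptions, not the authors' scoring algorithm. The logistic classifier weights, pathway names, drug-to-pathway mapping, and the rule that shrinks targeted scores toward 0 by a fixed `strength` are all hypothetical.

```python
import math

# Hypothetical linear classifier over pathway activity scores:
# a positive weight means higher activity pushes the call towards "diseased".
WEIGHTS = {"PI3K-Akt": 1.5, "MAPK": 1.0, "Apoptosis": -0.5}
BIAS = -1.0


def p_diseased(scores: dict[str, float]) -> float:
    """Logistic probability that a sample is diseased."""
    z = BIAS + sum(WEIGHTS[p] * scores.get(p, 0.0) for p in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def simulate_drug(scores: dict[str, float], targeted: list[str], strength: float = 0.8) -> dict[str, float]:
    """Return a copy of the scores with each targeted pathway shrunk
    towards 0 (assumed here to be the "normal" level) by `strength`."""
    simulated = dict(scores)
    for pathway in targeted:
        if pathway in simulated:
            simulated[pathway] *= 1.0 - strength
    return simulated


def reverses_disease(scores: dict[str, float], targeted: list[str], threshold: float = 0.5) -> bool:
    """True if the sample is called diseased before and normal after simulation."""
    before = p_diseased(scores) >= threshold
    after = p_diseased(simulate_drug(scores, targeted)) >= threshold
    return before and not after


# Hypothetical patient signature and drug targets.
patient = {"PI3K-Akt": 1.2, "MAPK": 0.2, "Apoptosis": -0.4}
hypothetical_drugs = {"drug_A": ["PI3K-Akt"], "drug_B": ["Apoptosis"]}
candidates = [d for d, targets in hypothetical_drugs.items() if reverses_disease(patient, targets)]
```

Here only `drug_A` flips the call, because it dampens the pathway with the largest contribution to the diseased score; ranking drugs by such flips is the proxy for candidate identification that the abstract describes.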

Subject(s)

Biological Phenomena, Machine Learning, Algorithms, Computer Simulation, Humans, Precision Medicine

19.

DecoPath: a web application for decoding pathway enrichment analysis.

Mubeen, Sarah; Bharadhwaj, Vinay S; Gadiya, Yojana; Hofmann-Apitius, Martin; Kodamullil, Alpha T; Domingo-Fernández, Daniel.

NAR Genom Bioinform ; 3(3): lqab087, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34568823

ABSTRACT

The past decades have brought a steady growth of pathway databases and enrichment methods. However, the proliferation of pathway data has not been accompanied by improved interoperability across databases, hampering the use of pathway knowledge from multiple databases for enrichment analysis. While integrative databases have attempted to address this issue, they often do not account for redundant information across resources. Furthermore, most studies that employ pathway enrichment analysis still rely upon a single database or enrichment method, although using another could yield different results. These shortcomings call for approaches that investigate the differences and agreements across databases and methods, since the choice of database and method in the design of a pathway analysis can be crucial to ensuring that its results are meaningful. Here we present DecoPath, a web application to assist in the interpretation of the results of pathway enrichment analysis. DecoPath provides an ecosystem to run enrichment analysis or directly upload results, and it facilitates the interpretation of those results with custom visualizations that highlight consensus and discrepancies at the pathway and gene levels. DecoPath is available at https://decopath.scai.fraunhofer.de, and its source code and documentation can be found on GitHub at https://github.com/DecoPath/DecoPath.
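The pathway-level consensus and discrepancy that DecoPath visualizes can be illustrated with a small summary over per-database results. This is a minimal sketch, not DecoPath's code. It assumes results are already mapped to shared pathway identifiers (the harmonization step that the abstract flags as the hard part) and treats an adjusted p-value below 0.05 as significant; the database names are real resources, but the pathway identifiers and p-values are made up.

```python
from collections import defaultdict

ALPHA = 0.05  # assumed significance cutoff on adjusted p-values


def summarize_agreement(results: dict[str, dict[str, float]]) -> dict[str, str]:
    """Classify each pathway by agreement across resources.

    `results` maps a resource (database or method) name to
    {pathway_id: adjusted_p_value}; a pathway missing from a resource
    is treated as not tested there.
    """
    significant_in = defaultdict(set)
    tested_in = defaultdict(set)
    for resource, pvalues in results.items():
        for pathway, p in pvalues.items():
            tested_in[pathway].add(resource)
            if p < ALPHA:
                significant_in[pathway].add(resource)

    summary = {}
    for pathway, tested in tested_in.items():
        hits = significant_in[pathway]
        if hits == tested:
            summary[pathway] = "consensus"  # significant everywhere it was tested
        elif hits:
            summary[pathway] = "discrepant"  # significant in some resources only
        else:
            summary[pathway] = "not significant"
    return summary


# Illustrative, made-up enrichment results.
results = {
    "KEGG": {"apoptosis": 0.01, "cell_cycle": 0.20},
    "Reactome": {"apoptosis": 0.03, "cell_cycle": 0.04},
    "WikiPathways": {"apoptosis": 0.002, "cell_cycle": 0.60, "wnt": 0.30},
}
summary = summarize_agreement(results)
```

A pathway such as `cell_cycle`, significant in only one resource, is exactly the kind of result that a single-database analysis would report without any signal that other resources disagree.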

20.

Towards a global investigation of transcriptomic signatures through co-expression networks and pathway knowledge for the identification of disease mechanisms.

Figueiredo, Rebeca Queiroz; Raschka, Tamara; Kodamullil, Alpha Tom; Hofmann-Apitius, Martin; Mubeen, Sarah; Domingo-Fernández, Daniel.

Nucleic Acids Res ; 49(14): 7939-7953, 2021 08 20.

Article in English | MEDLINE | ID: mdl-34197603

ABSTRACT

We attempt to address a key question in the joint analysis of transcriptomic data: can we correlate the patterns we observe in transcriptomic datasets with known interactions and pathway knowledge to broaden our understanding of disease pathophysiology? We present a systematic approach that sheds light on the patterns observed in hundreds of transcriptomic datasets from over sixty indications by using pathways and molecular interactions as a template. Our analysis employs transcriptomic datasets to construct dozens of disease-specific co-expression networks, alongside a human protein-protein interactome network. Leveraging the interoperability between these two network templates, we explore patterns both common and particular to these diseases on three different levels. First, at the node level, we identify the most and least common proteins across diseases and evaluate their consistency against the interactome as a proxy for their prevalence in the scientific literature. Second, we overlay both network templates to analyze common correlations and interactions across diseases at the edge level. Third, we explore the similarity between patterns observed at the disease level and pathway knowledge to identify signatures associated with specific diseases and indication areas. Finally, we present a case scenario in schizophrenia, where we show how our approach can be used to investigate disease pathophysiology.
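The edge-level step, overlaying disease-specific co-expression networks on a protein-protein interactome, can be sketched with plain set operations. This assumes each network is an undirected edge list over gene symbols; the genes, edges, and disease labels below are hypothetical, and the paper's network construction (e.g. how correlations are thresholded into edges) is not reproduced.

```python
from collections import Counter
from itertools import chain


def undirected(edges: list[tuple[str, str]]) -> set[frozenset]:
    """Normalize (a, b) pairs into order-independent edges."""
    return {frozenset(e) for e in edges}


def edge_overlay(coexpression: dict[str, list[tuple[str, str]]],
                 interactome: list[tuple[str, str]]) -> dict[frozenset, tuple[int, bool]]:
    """For every co-expression edge, return (number of disease networks
    containing it, whether it is also a protein-protein interaction)."""
    ppi = undirected(interactome)
    networks = {disease: undirected(edges) for disease, edges in coexpression.items()}
    # Each network is a set, so counting across them gives diseases per edge.
    frequency = Counter(chain.from_iterable(networks.values()))
    return {edge: (count, edge in ppi) for edge, count in frequency.items()}


# Hypothetical networks.
coexpression = {
    "disease_1": [("A", "B"), ("B", "C")],
    "disease_2": [("B", "A"), ("C", "D")],
    "disease_3": [("A", "B")],
}
interactome = [("A", "B"), ("C", "D")]
overlay = edge_overlay(coexpression, interactome)

# Edges co-expressed in every disease that are also known interactions.
shared = [e for e, (n, in_ppi) in overlay.items() if n == len(coexpression) and in_ppi]
```

Using `frozenset` edges makes ("A", "B") and ("B", "A") the same edge, which matters when networks built from different datasets list gene pairs in different orders.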

Subject(s)

Disease/genetics, Gene Expression Profiling/methods, Gene Regulatory Networks, Genetic Predisposition to Disease/genetics, Signal Transduction/genetics, Transcriptome/genetics, Algorithms, Cluster Analysis, Humans, Schizophrenia/genetics
